Gene number expansion and contraction in vertebrate genomes with respect to invertebrate genomes.

Authors

  • Anuphap Prachumwat
  • Wen-Hsiung Li
Abstract

Where did vertebrate genes come from? Here we address this question by analyzing eight completely sequenced land vertebrate genomes and six completely sequenced invertebrate genomes. Approximately 70% of the vertebrate genes can be found in the six invertebrate genomes with the standard homology search criteria (denoted as V.MCL), another approximately 6% can be found with relaxed search criteria, and an additional approximately 2% can be found in sequenced fungal and bacterial genomes. Thus, a substantial proportion of vertebrate genes (approximately 22%) cannot be found in the nonvertebrate genomes studied (denoted as Vonly). Interestingly, genes in Vonly are predominantly singletons, while the majority of genes in the other three groups belong to gene families. The proteins of Vonly tend to evolve faster than those of V.MCL. Surprisingly, in many cases the family sizes in V.MCL are only as large as or even smaller than their counterparts in the invertebrates, contrary to the general perception of a larger family size in vertebrates. Interestingly, in comparison with the family size in invertebrates, vertebrate gene families involved in regulation, signal transduction, transcription, protein transport, and protein modification tend to be expanded, whereas those involved in metabolic processes tend to be contracted. Furthermore, for almost all of the functional categories with family size expansion in vertebrates, the number of gene types (i.e., the number of singletons plus the number of gene families) tends to be over-represented in Vonly, but under-represented in V.MCL. Our study suggests that gene function is a major determinant of gene family size.
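
The abstract's bookkeeping can be illustrated with a minimal sketch. It checks that the reported shares leave roughly 22% for Vonly, and it counts "gene types" as defined above (singletons plus gene families). The function name, the gene and cluster identifiers, and the toy data are hypothetical illustrations, not the authors' pipeline or data; only the rounded percentages come from the abstract.

```python
from collections import Counter

# Approximate shares of vertebrate genes reported in the abstract (percent).
shares_found = {
    "V.MCL (standard criteria)": 70,
    "relaxed criteria": 6,
    "fungal/bacterial genomes": 2,
}
# Whatever is left cannot be found in nonvertebrate genomes (Vonly).
vonly_share = 100 - sum(shares_found.values())


def summarize_gene_types(gene_to_cluster):
    """Count singletons, families, and gene types from cluster assignments.

    gene_to_cluster maps a gene ID to a cluster ID (hypothetical input format).
    A cluster with one member is a singleton; two or more members form a family.
    """
    cluster_sizes = Counter(gene_to_cluster.values())
    singletons = sum(1 for size in cluster_sizes.values() if size == 1)
    families = sum(1 for size in cluster_sizes.values() if size >= 2)
    return {
        "singletons": singletons,
        "families": families,
        "gene_types": singletons + families,
    }


# Hypothetical toy assignment: c1 has 2 genes, c2 has 1 gene, c3 has 3 genes.
toy_assignment = {
    "g1": "c1", "g2": "c1",
    "g3": "c2",
    "g4": "c3", "g5": "c3", "g6": "c3",
}
summary = summarize_gene_types(toy_assignment)
print(vonly_share, summary)
```

In this toy case there is one singleton and two families, so three gene types, regardless of how many genes each family contains. That is why the number of gene types and family size can move in different directions across functional categories.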

Similar resources

Predicting CpG Islands and Their Relationship with Genomic Feature in Cattle by Hidden Markov Model Algorithm

Cattle supply an important source of nutrition for humans in the world. CpG islands (CGIs) are very important and useful, as they carry functionally relevant epigenetic loci for whole genome studies. As a matter of fact, there have been no formal analyses of CGIs at the DNA sequence level in cattle genomes and therefore this study was carried out to fill the gap. We used hidden Markov model alg...

Evolutionary Transition of Promoter and Gene Body DNA Methylation across Invertebrate–Vertebrate Boundary

Genomes of invertebrates and vertebrates exhibit highly divergent patterns of DNA methylation. Invertebrate genomes tend to be sparsely methylated, and DNA methylation is mostly targeted to a subset of transcription units (gene bodies). In a drastic contrast, vertebrate genomes are generally globally and heavily methylated, punctuated by the limited local hypo-methylation of putative regulatory...

Evaluation of First and Second Markov Chains Sensitivity and Specificity as Statistical Approach for Prediction of Sequences of Genes in Virus Double Strand DNA Genomes

The growing amount of information on biological sequences has made the application of statistical approaches necessary for modeling and estimating their functions. In this paper, the sensitivity and specificity of the first and second Markov chains for prediction of genes were evaluated using the complete double stranded DNA virus. There were two approaches for prediction of each Markov Model parameter,...

Comparative Genomic Study Reveals a Transition from TA Richness in Invertebrates to GC Richness in Vertebrates at CpG Flanking Sites: An Indication for Context-Dependent Mutagenicity of Methylated CpG Sites

Vertebrate genomes are characterized with CpG deficiency, particularly for GC-poor regions. The GC content-related CpG deficiency is probably caused by context-dependent deamination of methylated CpG sites. This hypothesis was examined in this study by comparing nucleotide frequencies at CpG flanking positions among invertebrate and vertebrate genomes. The finding is a transition of nucleotide ...

Broadening Gene Pool of Rice for Resistance to Biotic Stresses Through Wide Hybridization

Variability in the cultivated germplasm for economic traits such as resistance to rice tungro virus, sheath blight, yellow stem borer, drought and salt tolerance is limited. This necessitated a search for genes in the secondary and tertiary gene pools of the genus Oryza. Fortunately, wild species are an important reservoir of useful genes for resistance to major disease, pest and tolerance t...

Journal title:
  • Genome research

Volume 18, Issue 2

Publication date: 2008